Snohomish County
Unsupervised Evaluation of Multi-Turn Objective-Driven Interactions
Soroka, Emi, Chopra, Tanmay, Desai, Krish, Lall, Sanjay
Large language models (LLMs) have seen increasing popularity in enterprise applications where AI agents and humans engage in objective-driven interactions. However, these systems are difficult to evaluate: data may be complex and unlabeled; human annotation is often impractical at scale; custom metrics can monitor for specific errors, but not previously undetected ones; and LLM judges can produce unreliable results. We introduce the first set of unsupervised metrics for objective-driven interactions, leveraging statistical properties of unlabeled interaction data and using fine-tuned LLMs to adapt to distributional shifts. We develop metrics for labeling user goals, measuring goal completion, and quantifying LLM uncertainty without grounding evaluations in human-generated ideal responses. Our approach is validated on open-domain and task-specific interaction data.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (8 more...)
- Banking & Finance > Insurance (1.00)
- Health & Medicine > Health Care Providers & Services (0.93)
- Health & Medicine > Therapeutic Area (0.67)
- Europe > Moldova (1.00)
- Asia > Middle East > Israel (0.68)
- Atlantic Ocean (0.45)
- (20 more...)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Leisure & Entertainment > Sports > Hockey (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- (16 more...)
HistoryBankQA: Multilingual Temporal Question Answering on Historical Events
Mandal, Biswadip, Khandelwal, Anant, Gupta, Manish
Temporal reasoning about historical events is a critical skill for NLP tasks like event extraction, historical entity linking, temporal question answering, timeline summarization, temporal event clustering and temporal natural language inference. Yet efforts on benchmarking temporal reasoning capabilities of large language models (LLMs) are rather limited. Existing temporal reasoning datasets are limited in scale, lack multilingual coverage and focus more on contemporary events. To address these limitations, we present HistoryBank, a multilingual database of 10M+ historical events extracted from Wikipedia timeline pages and article infoboxes. Our database provides unprecedented coverage in both historical depth and linguistic breadth with 10 languages. Additionally, we construct a comprehensive question answering benchmark for temporal reasoning across all languages. This benchmark covers a diverse set of 6 temporal QA reasoning tasks, and we evaluate a suite of popular language models (LLaMA-3-8B, Mistral-7B, Gemma-2-9b, Qwen3-8B, GPT4o) to assess their performance on these tasks. As expected, GPT4o performs best across all answer types and languages; Gemma-2 outperforms the other small language models. Our work aims to provide a comprehensive resource for advancing multilingual and temporally-aware natural language understanding of historical events. To facilitate further research, we will make our code and datasets publicly available upon acceptance of this paper.
- Leisure & Entertainment > Sports (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Media (0.68)
- Law (0.67)
SPIRA: Building an Intelligent System for Respiratory Insufficiency Detection
Ferreira, Renato Cordeiro, Gomes, Dayanne, Tamae, Vitor, Wernke, Francisco, Goldman, Alfredo
Respiratory insufficiency is a medical symptom in which a person gets a reduced amount of oxygen in the blood. This paper reports the experience of building SPIRA: an intelligent system for detecting respiratory insufficiency from voice. It compiles challenges faced in two successive implementations of the same architecture, summarizing lessons learned on data collection, training, and inference for future projects in similar systems.
- South America > Brazil > São Paulo (0.07)
- North America > United States > Washington > Snohomish County > Lynnwood (0.04)
- North America > United States > New York > New York County > New York City (0.04)
Anchored Diffusion Language Model
Rout, Litu, Caramanis, Constantine, Shakkottai, Sanjay
Diffusion Language Models (DLMs) promise parallel generation and bidirectional context, yet they underperform autoregressive (AR) models in both likelihood modeling and generated text quality. We identify that this performance gap arises when important tokens (e.g., key words or low-frequency words that anchor a sentence) are masked early in the forward process, limiting contextual information for accurate reconstruction. To address this, we introduce the Anchored Diffusion Language Model (ADLM), a novel two-stage framework that first predicts distributions over important tokens via an anchor network, and then predicts the likelihoods of missing tokens conditioned on the anchored predictions. ADLM significantly improves test perplexity on LM1B and OpenWebText, achieving up to 25.4% gains over prior DLMs, and narrows the gap with strong AR baselines. It also achieves state-of-the-art performance in zero-shot generalization across seven benchmarks and surpasses AR models in MAUVE score, which marks the first time a DLM generates better human-like text than an AR model. Theoretically, we derive an Anchored Negative Evidence Lower Bound (ANELBO) objective and show that anchoring improves sample complexity and likelihood modeling. Beyond diffusion, anchoring boosts performance in AR models and enhances reasoning in math and logic tasks, outperforming existing chain-of-thought approaches.
- Africa > Middle East > Egypt (0.04)
- North America > United States > Oklahoma > Oklahoma County > Oklahoma City (0.04)
- Asia > Middle East > Jordan (0.04)
- (11 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Government (1.00)
- Leisure & Entertainment > Sports > Basketball (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Reliable, Routable, and Reproducible: Collection of Pedestrian Pathways at Statewide Scale
Zhang, Yuxiang, Howe, Bill, Caspi, Anat
While advances in mobility technology including autonomous vehicles and multi-modal navigation systems can improve mobility equity for people with disabilities, these technologies depend crucially on accurate, standardized, and complete pedestrian path networks. Ad hoc collection efforts lead to a data record that is sparse, unreliable, and non-interoperable. This paper presents a sociotechnical methodology to collect, manage, serve, and maintain pedestrian path data at a statewide scale. Combining the automation afforded by computer-vision approaches applied to aerial imagery and existing road network data with the quality control afforded by interactive tools, we aim to produce routable pedestrian pathways for the entire State of Washington within approximately two years. We extract paths, crossings, and curb ramps at scale from aerial imagery, integrating multi-input segmentation methods with road topology data to ensure connected, routable networks. We then organize the predictions into project regions selected for their value to the public interest, where each project region is divided into intersection-scale tasks. These tasks are assigned and tracked through an interactive tool that manages concurrency, progress, feedback, and data management. We demonstrate that our automated systems outperform state-of-the-art methods in producing routable pathway networks, which then significantly reduces the time required for human vetting. Our results demonstrate the feasibility of yielding accurate, robust pedestrian pathway networks at the scale of an entire state. This paper intends to inform procedures for national-scale ADA compliance by providing pedestrian equity, safety, and accessibility, and improving urban environments for all users.
- North America > United States > Washington > Snohomish County (0.04)
- North America > United States > Washington > King County (0.04)
- North America > United States > Oregon > Multnomah County (0.04)
- (3 more...)
- Transportation > Infrastructure & Services (0.66)
- Transportation > Ground > Road (0.66)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.67)
- Information Technology > Communications > Social Media > Crowdsourcing (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Urban Mobility Assessment Using LLMs
Bhandari, Prabin, Anastasopoulos, Antonios, Pfoser, Dieter
Understanding urban mobility patterns and analyzing how people move around cities helps improve the overall quality of life and supports the development of more livable, efficient, and sustainable urban areas. A challenging aspect of this work is the collection of mobility data by means of user tracking or travel surveys, given the associated privacy concerns, noncompliance, and high cost. This work proposes an innovative AI-based approach for synthesizing travel surveys by prompting large language models (LLMs), aiming to leverage their vast amount of relevant background knowledge and text generation capabilities. Our study evaluates the effectiveness of this approach across various U.S. metropolitan areas by comparing the results against existing survey data at different granularity levels. These levels include (i) pattern level, which compares aggregated metrics like the average number of locations traveled and travel time, (ii) trip level, which focuses on comparing trips as whole units using transition probabilities, and (iii) activity chain level, which examines the sequence of locations visited by individuals. Our work covers several proprietary and open-source LLMs, revealing that open-source base models like Llama-2, when fine-tuned on even a limited amount of actual data, can generate synthetic data that closely mimics the actual travel survey data, and as such provides an argument for using such data in mobility studies.
- North America > United States > California > San Francisco County > San Francisco (0.15)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- (9 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Transportation (0.93)
- Information Technology (0.86)
- Government > Regional Government > North America Government > United States Government (0.67)
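The trip-level comparison mentioned in the abstract above (comparing trips via transition probabilities) can be illustrated with a minimal sketch. The abstract does not say how the probabilities are estimated or compared, so everything here is an assumption for illustration: the function names `transition_probabilities` and `mean_total_variation` are hypothetical, the first-order counting scheme is one common choice, and total variation distance is a stand-in for whatever measure the authors actually use. The toy chains are made up and are not survey data.

```python
from collections import Counter, defaultdict


def transition_probabilities(chains):
    """Estimate P(next location | current location) from location sequences."""
    counts = defaultdict(Counter)
    for chain in chains:
        # Count every consecutive (current, next) pair in the chain.
        for src, dst in zip(chain, chain[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def mean_total_variation(p, q):
    """Average total variation distance between rows of two transition tables.

    Assumption: a source location missing from one table is treated as an
    all-zero row, which contributes a distance of 0.5 for that location.
    """
    sources = set(p) | set(q)
    if not sources:
        return 0.0
    total = 0.0
    for src in sources:
        row_p, row_q = p.get(src, {}), q.get(src, {})
        dests = set(row_p) | set(row_q)
        total += 0.5 * sum(abs(row_p.get(d, 0.0) - row_q.get(d, 0.0)) for d in dests)
    return total / len(sources)


# Toy, made-up activity chains (not real survey data).
survey = [["home", "work", "home"], ["home", "shop", "home"]]
synthetic = [["home", "work", "home"], ["home", "work", "home"]]

p_survey = transition_probabilities(survey)
p_synth = transition_probabilities(synthetic)
distance = mean_total_variation(p_survey, p_synth)
```

A distance of zero would mean the synthetic trips reproduce the survey's transition structure exactly; larger values indicate that the generated data moves between location types differently from the reference survey.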
The fatal mistake a Tesla driver made before killing 'kind and outgoing' 28-year-old in Washington
Authorities have confirmed that a Tesla on autopilot was partly responsible for a crash in Washington that killed a motorcyclist. Jeffrey Nissen, 28, was traveling about 15 miles northeast of Seattle when a Model S came from behind and rammed him off his bike before running him over. Investigators from the Washington State Patrol found the Tesla driver was operating on the company's 'Full Self Driving' (FSD) and had looked at his cell phone while the vehicle was moving. Nissen was found under the car and pronounced dead at the scene, authorities reported. The 56-year-old driver was arrested for investigation of vehicular homicide.
- North America > United States > Washington > King County > Seattle (0.40)
- North America > United States > Washington > Snohomish County (0.05)
- North America > United States > Colorado (0.05)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.71)
Data-Driven Ergonomic Risk Assessment of Complex Hand-intensive Manufacturing Processes
Krishnan, Anand, Yang, Xingjian, Seth, Utsav, Jeyachandran, Jonathan M., Ahn, Jonathan Y., Gardner, Richard, Pedigo, Samuel F., Blom-Schieber, Adriana, Banerjee, Ashis G., Manohar, Krithika
Hand-intensive manufacturing processes, such as composite layup and textile draping, require significant human dexterity to accommodate task complexity. These strenuous hand motions often lead to musculoskeletal disorders and rehabilitation surgeries. We develop a data-driven ergonomic risk assessment system with a special focus on hand and finger activity to better identify and address ergonomic issues related to hand-intensive manufacturing processes. The system comprises a multi-modal sensor testbed to collect and synchronize operator upper body pose, hand pose and applied forces; a Biometric Assessment of Complete Hand (BACH) formulation to measure high-fidelity hand and finger risks; and industry-standard risk scores associated with upper body posture (RULA) and hand activity (HAL). Our findings demonstrate that BACH captures injurious activity at a higher granularity than the existing metrics. Machine learning models are also used to automate RULA and HAL scoring, and generalize well to unseen participants. Our assessment system, therefore, provides ergonomic interpretability of the manufacturing processes studied, and could be used to mitigate risks through minor workplace optimization and posture corrections.
- North America > United States > Washington > King County > Seattle (0.14)
- Europe > United Kingdom (0.04)
- Asia > India (0.04)
- (7 more...)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (0.61)
Titan submersible recovery efforts continue with help of remotely operated vehicle
Navy SEAL Jake Zweig responds to the intense search for the missing Titanic submarine on 'Fox & Friends.' Efforts to recover the remains of the Titan submersible that suffered a catastrophic implosion near the Titanic wreckage are currently underway, and as of Sunday, had descended to the seafloor for a fourth dive. Last Thursday, the U.S. Coast Guard confirmed that a debris field located about 1,600 feet from the wreckage of the Titanic was in fact that of the missing Titan submersible. The underwater vessel was carrying five men on board when it lost contact with its surface ship about an hour and 45 minutes after descending to the Titanic. South Wellfleet, Massachusetts-based Pelagic Research Services (PRS) was contacted by OceanGate, the company behind Titan, for use of its remotely operated vehicles, or "ROVs," to assist with the search. Pelagic Research Services continues to assist the Transportation Safety Board of Canada, U.S. Coast Guard, and U.S. National Transportation Safety Board with Titan recovery efforts near the Titanic wreckage.
- North America > Canada (0.39)
- North America > United States > Massachusetts (0.26)
- North America > United States > Washington > Snohomish County > Everett (0.06)